The SOM map constructed based on the 3-mer data of 58,897 SARS-CoV2 sequences from four countries. The letters a, b, c and d stand for USA, India, [...] and Brazil, respectively, and are coloured red, blue, green and black.

In order to examine whether the SOM map truly reflected the data [...] of the sequences, the fit of the SOM neuron occupancy percentages to the percentage distribution of the sequences from different countries in the whole data was tested. Table 7.22 shows the test result. The Chi-square test p value for the two sets of percentages was 0.996, meaning that the two percentage distributions were almost identical. Therefore, the SOM map truly reflected the genomic deviation of these sequences from the four countries. In addition, 53 neurons were mixed, that is, occupied by sequences from more than one country. In total, 1,355 sequences (2.3%) were mapped onto these neurons. This low error rate also implied that the SOM map has well reflected the original genomic pattern or structure hidden in these sequences from the four countries.
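The two checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the percentages in `sequence_pct` and `neuron_pct` are hypothetical placeholders, not the values of Table 7.22, and the helper names `chi2_sf` and `chi_square_gof` are introduced here for the example. Following the wording of the text, the statistic is computed on percentages; a test on raw counts would give a different statistic. Only the mixed-sequence count (1,355) and the total (58,897) come from the text.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        total = sum(z ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-z) * total
    # Odd df: complementary error function plus a finite half-integer sum.
    total = sum(z ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(z)) + math.exp(-z) * total


def chi_square_gof(observed_pct, expected_pct):
    """Goodness-of-fit statistic and p value for observed vs expected percentages."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed_pct, expected_pct))
    df = len(expected_pct) - 1
    return stat, chi2_sf(stat, df)


# Hypothetical percentages for the four countries (NOT the values of Table 7.22).
sequence_pct = [40.0, 25.0, 20.0, 15.0]  # share of sequences per country
neuron_pct = [39.5, 25.5, 20.3, 14.7]    # share of occupied neurons per country

stat, p_value = chi_square_gof(neuron_pct, sequence_pct)
print(f"chi-square = {stat:.5f}, p = {p_value:.3f}")

# Share of sequences mapped onto mixed neurons (counts taken from the text).
mixed_rate = 1355 / 58897
print(f"mixed-neuron rate = {100 * mixed_rate:.1f}%")
```

A p value close to 1 means the test finds no evidence that the neuron occupancy percentages deviate from the country percentages, which is the sense in which the text calls the two distributions almost identical.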